Your database is running fine - until a simple DELETE operation crashes it. What went wrong? While we tend to focus on optimizing SELECT and INSERT operations,
we often overlook the hidden complexities of DELETE. Yet removing unnecessary data is just as important. Outdated or irrelevant data can bloat your database, degrade performance, and make maintenance a nightmare. Even worse, keeping certain types of data without a valid reason can lead to compliance issues.
At first glance, the DELETE command seems simple. Even the PostgreSQL documentation keeps its examples short:
DELETE FROM films WHERE kind <> 'Musical';
DELETE FROM films;
These queries may run effortlessly on your development machine, where there are only a few hundred records. But what happens when you try to run a similar
DELETE in a production environment, where the data set is orders of magnitude larger?
In this article, we will reveal why DELETE operations require careful consideration and explore how to handle them effectively.
What exactly happens when you delete data?
The statement itself may be short, but once it is executed, a series of complex steps takes place:
Row Identification: Similar to the SELECT operation, the query identifies the rows that are visible to the current transaction (think MVCC) and checks for locks.
Lock Acquisition: The database acquires row-level exclusive locks to prevent other operations on the target row.
BEFORE DELETE Trigger: If a BEFORE DELETE trigger is defined, it is executed at this point.
Mark Rows as Deleted: Instead of being physically deleted, rows are marked as deleted within the current transaction, making them invisible to future queries (depending on the isolation level). If the table stores large values out of line, the associated TOAST table is involved as well.
Index entries: the corresponding index entries are not removed at this point; they keep pointing at the now-dead row versions and are only cleaned up later (by vacuum, or opportunistically during index scans).
Cascade operations: foreign-key actions such as ON DELETE CASCADE are applied to referencing tables (see the sketch after this list).
AFTER DELETE Trigger: If an AFTER DELETE trigger is defined, it is executed.
Write-Ahead Logging (WAL): all of these changes are recorded in the WAL, which is what makes them durable and allows them to be replicated.
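To make these steps more tangible, here is a minimal sketch - the reviews table and the log_film_delete trigger are hypothetical, used purely for illustration - of a schema in which even a single-row DELETE already involves locking, a trigger, a foreign-key cascade, and WAL:
-- Hypothetical schema, for illustration only.
CREATE TABLE films (
    id   bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    kind text NOT NULL
);

-- A referencing table: deleting a film cascades into it ("Cascade operations").
CREATE TABLE reviews (
    id      bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    film_id bigint NOT NULL REFERENCES films (id) ON DELETE CASCADE,
    body    text
);

-- A BEFORE DELETE trigger, fired for each row before it is marked as deleted.
CREATE FUNCTION log_film_delete() RETURNS trigger AS $$
BEGIN
    RAISE NOTICE 'deleting film %', OLD.id;
    RETURN OLD;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER films_before_delete
    BEFORE DELETE ON films
    FOR EACH ROW EXECUTE FUNCTION log_film_delete();

-- This one-row DELETE now acquires a row lock, fires the trigger,
-- cascades into reviews, and writes the corresponding WAL records.
DELETE FROM films WHERE id = 1;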
Only after the transaction commits do these changes become permanent and visible to transactions started later. Even then, however, the data is not physically deleted. This is where bloat comes in.
Until the space is reclaimed by autovacuum or a manual VACUUM, the “deleted” rows remain on disk. This residual data causes bloat, which degrades query performance over time.
The key question now is whether DELETE is really the most difficult operation our database can handle. The answer? Quite possibly. While UPDATEs are close in
complexity, they are often designed in a way that makes them less challenging:
UPDATE typically modifies only a limited number of columns, reducing the potential number of index updates.
Not every UPDATE has to touch the indexes: with HOT (Heap-Only Tuple) updates, the new row version stays on the same page and the existing index entries keep working. You can encourage this with careful table and query design - for example, by avoiding changes to indexed columns and leaving free space on the pages. The query after this list shows how often your updates qualify.
Unlike DELETE, an UPDATE usually does not fire foreign-key cascades (ON UPDATE CASCADE only applies when the referenced key columns change) - beyond that, only explicitly defined triggers are involved.
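If you are curious how often your updates actually qualify as HOT, pg_stat_user_tables keeps per-table counters (films is just an example table name):
-- Compare total updates with heap-only (HOT) updates for one table.
SELECT relname,
       n_tup_upd     AS total_updates,
       n_tup_hot_upd AS hot_updates
FROM pg_stat_user_tables
WHERE relname = 'films';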
Next is AUTOVACUUM
When autovacuum kicks in (and you do want it to) - usually triggered by a threshold on the number of dead tuples or table changes - it has a lot of work to do to clean them up. Let's break it down step by step:
The process starts with a table scan. It is not always a full table scan: autovacuum consults the visibility map and only visits pages that may contain dead tuples. This can happen gradually and copes even with large tables - as long as your autovacuum settings allow it to run frequently enough.
Each tuple is checked to make sure it is no longer visible to any active or pending transaction.
Dead tuples that pass the visibility check are physically removed from the table.
The corresponding index entries for those dead tuples are removed.
The freed space is marked for reuse by future INSERT or UPDATE operations.
Table statistics are updated to reflect the current state to help query planners make better decisions.
These changes, including tuple removals and index updates, are recorded in the write-ahead log (WAL) for durability and replication.
If the table has TOAST data (large values stored out of line), the associated TOAST table is processed as well.
The visibility map is updated to mark cleaned pages as all-visible again.
Autovacuum finally updates the statistics and counters that determine when the next cleanup run is due.
This work is throttled by the configured vacuum cost limit (in the case of autovacuum): whenever the limit is reached, the process pauses briefly before carrying on. While AUTOVACUUM helps keep the database under control, it's clear that reclaiming dead tuples is no easy task - which underlines why a DELETE can have a lasting impact on database performance.
While all this AUTOVACUUM work may sound like bad news, without it your database would soon become bloated with dead tuples, resulting in degraded performance, slower queries, increased storage usage, and possibly even running out of disk space because the unused space is never reclaimed.
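A quick way to see whether autovacuum is keeping up is to check the dead-tuple counters and the last vacuum times in pg_stat_user_tables:
-- Tables with the most dead tuples, plus when they were last vacuumed.
SELECT relname,
       n_live_tup,
       n_dead_tup,
       last_vacuum,
       last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;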
Further considerations
A great deal of work has already gone into the seemingly simple DELETE, but the complexity doesn't stop there. The DELETE operation can present additional challenges, especially when replication, resource contention, or the sheer size of the operation is involved.
In environments that replicate to hot standby servers or replicas, DELETE operations become even more time sensitive. With synchronous replication, the transaction cannot commit until the corresponding WAL (write-ahead log) record has been written to disk on the standby. This is a basic requirement for maintaining data consistency in a high availability setup, which usually involves at least one standby server. In addition, if the standby is actively serving read queries, replaying the DELETE may conflict with those queries, which can introduce further delays.
The size of the DELETE operation also plays a critical role. The impact of a small DELETE, such as deleting a row, tends to be minimal. However, as the size of the
operation increases, so does the amount of WAL records generated. Large DELETEs can overwhelm the system, slowing down transactions and straining the
replication process. Standby servers must work harder to process incoming WAL streams, which can become a performance bottleneck if their throughput is
insufficient.
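To see how much a large DELETE is straining replication, pg_stat_replication on the primary reports per-standby lag (the lag columns are available since PostgreSQL 10):
-- Replication lag per standby, as seen from the primary.
SELECT application_name,
       state,
       write_lag,
       flush_lag,
       replay_lag
FROM pg_stat_replication;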
Resource contention adds another layer of complexity, especially for large DELETEs. Generating WAL records, handling regular transactional workloads, and running
background processes can combine to saturate system I/O. This leads to competition for CPU and memory resources, which slows down the overall operation.
Finally, once data has been marked as deleted, autovacuum must eventually step in to physically remove it. This brings its own challenges, as autovacuum has to deal with the same resource contention and I/O demands, amplifying the overall impact of the initial DELETE operation.
Soft deletion is not a solution
Soft deletion seems like an easy way to avoid the complexity of a real DELETE. After all, updating a deleted_at field is simple and, as long as that column is not indexed, will usually even be a HOT update. However, soft deletes are not a true data deletion mechanism, and they bring complexity of their own.
While soft deletions can provide an easy way to implement “undo” functionality, they raise serious questions about data consistency. For example, are you marking
only the primary entity as deleted, or are you also cascading the status to all related records in referencing tables? If the flag is not cascaded properly, the database may be
in an inconsistent state, making it difficult to maintain data integrity.
Soft deletions also need to be considered in your application logic. Every query must include appropriate filters to exclude “deleted” rows, which can complicate
query design and increase the risk of oversights. A missed filter can expose data that should no longer be displayed, leading to potential security or business logic
issues.
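If you do adopt soft deletes, a partial index - and possibly a view - can at least keep the "live rows only" filter cheap and harder to forget. A minimal sketch, assuming a deleted_at column on films:
-- Index only the rows that are still live, so the usual
-- WHERE deleted_at IS NULL filter stays cheap.
CREATE INDEX films_live_idx ON films (kind) WHERE deleted_at IS NULL;

-- Optionally hide the filter behind a view so queries cannot forget it.
CREATE VIEW live_films AS
    SELECT * FROM films WHERE deleted_at IS NULL;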
Finally, soft deletions don't solve the problem - they just postpone it. The data remains in your database, consuming storage space and potentially causing
performance to degrade over time. Sooner or later, you'll need to actually delete this data, which brings you right back to the same challenges DELETE poses.
At the time of this writing, we can only speculate on the extent to which support for temporal PRIMARY KEY and UNIQUE constraints in PostgreSQL 18 will shift the
balance in the future. But given the complexity of the feature, I'm hesitant to draw any conclusions at this time.
The answer is batch processing
When dealing with operations such as DELETE, it is critical that PostgreSQL has time to process and keep up with large-scale changes. The central issue here is the
duration and size of the transaction. The shorter the transaction and the fewer changes it makes, the better PostgreSQL can manage and coordinate those changes.
This principle applies to all database operations and emphasizes the importance of minimizing the impact of individual transactions.
While you can optimize some aspects, such as row identification (using indexing, clustering, or similar techniques), larger data sets require a more strategic
approach - batch processing. For example, deleting a million rows in a single transaction is a textbook example of what not to do. Instead, it is more efficient to break the operation into smaller batches (e.g., deleting 10,000 rows in 100 iterations).
Is this approach faster than performing a mass DELETE operation? Probably not, especially if you include wait time between batches to allow PostgreSQL to process
other workloads. However, the tradeoff is worth it. With batch processing, you give PostgreSQL more breathing room to manage changes without overwhelming
regular transactional workloads - unless, of course, you schedule dedicated maintenance time for the operation.
How to batch delete
The easiest way to perform a bulk DELETE operation is to use a subquery or common table expression (CTE) to limit the number of rows affected in each iteration.
For example, rather than running one bulk DELETE like this:
DELETE FROM films WHERE kind <> 'Musical';
you can break the operation into smaller chunks. With a query like the one below, you can delete rows in manageable batches and simply repeat the statement until nothing is left (e.g., using \watch in psql to automate the iterations):
DELETE FROM films
WHERE ctid IN (
    SELECT ctid
    FROM films
    WHERE kind <> 'Musical'
    LIMIT 250
);
The ctid used in this example is a PostgreSQL system column that identifies the physical location of each row version. Selecting ctid values in a subquery lets you cap the number of rows affected per iteration - DELETE itself has no LIMIT clause - and the inner query can stop scanning as soon as it has collected a batch's worth of rows.
If you don't feel comfortable with ctid (which probably deserves a post of its own), you can use a regular primary-key lookup with LIMIT instead, as shown below.
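A primary-key variant might look like the sketch below (assuming films has an id primary key); run it repeatedly, for example with \watch in psql, until it reports zero deleted rows:
-- Delete in batches of 10,000 rows, selected by primary key.
WITH batch AS (
    SELECT id
    FROM films
    WHERE kind <> 'Musical'
    ORDER BY id
    LIMIT 10000
)
DELETE FROM films
WHERE id IN (SELECT id FROM batch);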
Planning Autovacuum
Batch processing on its own does not guarantee that autovacuum can keep up with the changes - you need to plan for that separately. Tuning the autovacuum settings, or triggering a manual VACUUM or VACUUM ANALYZE between batches, can help manage the bloat generated during the DELETE process. Disabling autovacuum, however, is rarely a good idea unless you have carefully planned manual maintenance around the whole batch operation. Skipping this step can leave behind performance-impairing bloat that takes far more effort to resolve later.
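For example, you can make autovacuum react sooner on the affected table, or run a manual VACUUM between batches; the numbers below are purely illustrative:
-- Let autovacuum start after ~1% of the table is dead (default is 20%);
-- the values are illustrative, not a recommendation.
ALTER TABLE films SET (
    autovacuum_vacuum_scale_factor = 0.01,
    autovacuum_vacuum_threshold    = 1000
);

-- Or reclaim space and refresh statistics manually after a batch run.
VACUUM (ANALYZE) films;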
Deleting data with partitioning
Data that is naturally segmented (e.g., by creation time) is an ideal candidate for removal via partitioning. Partitioning lets you bypass the DELETE operation altogether by simply dropping or truncating the partition in question. This approach is far more efficient and avoids the overhead of scanning, locking, and marking rows as deleted, effectively eliminating bloat.
While partitioning adds complexity to schema design and query planning, it can provide significant performance benefits for DELETE-heavy workloads, especially when combined with automated partition management.
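As a rough sketch - table and partition names are illustrative - monthly range partitions let you drop a whole month of data instead of deleting it row by row:
-- A table range-partitioned by creation time, one partition per month.
CREATE TABLE events (
    created_at timestamptz NOT NULL,
    payload    jsonb
) PARTITION BY RANGE (created_at);

CREATE TABLE events_2024_01 PARTITION OF events
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');

-- Removing January 2024 is a quick metadata operation that leaves no bloat behind:
DROP TABLE events_2024_01;
-- or keep the data around outside the partitioned table:
-- ALTER TABLE events DETACH PARTITION events_2024_01;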
Conclusion
DELETE operations often come with unpleasant surprises - not only can they hurt performance and cause bloat, they can backfire when we least expect it. To deal with them effectively, focus on strategies such as batch processing, keeping an eye on autovacuum, and using partitioning for large data sets. By considering DELETE operations during architectural design, you can maintain an efficient database with fewer maintenance headaches and ensure that it continues to run smoothly as your data grows.